The TextPro Tool Suite

Authors

  • Emanuele Pianta
  • Christian Girardi
  • Roberto Zanoli
Abstract

We present TextPro, a suite of modular Natural Language Processing (NLP) tools for the analysis of Italian and English texts. The suite is designed to integrate and reuse state-of-the-art NLP components developed by researchers at FBK. The current version of the tool suite provides functions ranging from tokenization to chunking and Named Entity Recognition (NER). The system's architecture is organized as a pipeline of processors in which each stage accepts data from the initial input or from the output of a previous stage, executes a specific task, and sends the resulting data to the next stage or to the output of the pipeline. TextPro achieved the best results on the Italian NER and Italian PoS tagging tasks at EVALITA 2007. When tested on a number of other standard English benchmarks, TextPro confirms that it performs as a state-of-the-art system. Distributions for Linux, Solaris and Windows are available for both research and commercial purposes. A web-service version of the system is under development.
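The pipeline organization described in the abstract can be illustrated with a minimal sketch. This is not TextPro's actual code or API: the function names (`tokenize`, `tag_capitalized`, `run_pipeline`), the dictionary-based document representation, and the toy processing logic are all assumptions made purely for illustration. The sketch only shows the general idea of stages that each take the previous stage's output, add one layer of annotation, and pass the result on.

```python
from typing import Any, Callable, Dict, List

# Hypothetical document representation: a dict of annotation layers.
Doc = Dict[str, Any]
Stage = Callable[[Doc], Doc]


def tokenize(doc: Doc) -> Doc:
    # Naive whitespace tokenizer, used only as a stand-in for a real one.
    return {**doc, "tokens": doc["text"].split()}


def tag_capitalized(doc: Doc) -> Doc:
    # Toy stand-in for a later stage: marks tokens that start with an
    # uppercase letter. It relies on the "tokens" layer from the stage before.
    return {**doc, "capitalized": [t[:1].isupper() for t in doc["tokens"]]}


def run_pipeline(text: str, stages: List[Stage]) -> Doc:
    # The first stage receives the initial input; every later stage receives
    # the output of the previous one; the last output is the pipeline result.
    doc: Doc = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline("TextPro was developed at FBK", [tokenize, tag_capitalized])
print(result["tokens"])
print(result["capitalized"])
```

Because each stage only reads layers produced upstream and adds its own, stages can be reordered, dropped, or replaced as long as their inputs are still available, which is one plausible reading of what "modular" means in this setting.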

Similar resources

Named Entity Extraction from Speech: Approach and Results Using the Textpro System

This paper describes the application of the TextPro system to the task of recognition of named entities in speech. TextPro is a lightweight engine for interpreting cascaded finite-state transducers. Although originally intended for processing text, the experience of this evaluation demonstrates the system can easily be adapted to processing transcripts generated by a speech recognizer as well. ...


VenPro: A Morphological Analyzer for Venetan

This document reports the process of extending MorphoPro for Venetan, a lesser-used language spoken in the North-Eastern part of Italy. MorphoPro is the morphological component of TextPro, a suite of tools oriented towards a number of NLP tasks. In order to extend this component to Venetan, we developed a declarative representation of the morphological knowledge necessary to analyze and synthesi...


Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

We present an experimental framework for Entity Mention Detection in which two different classifiers are combined to exploit Data Redundancy attained through the annotation of a large text corpus, as well as a number of Patterns extracted automatically from the same corpus. In order to recognize proper name, nominal, and pronominal mentions we not only exploit the information given by mentions ...


An analytical model based on simulation aiming to improve patient flow in a hospital surgical suite

Surgical suites account for a large share of hospital expenses; on the other hand, they constitute a large part of hospital revenues. Patient flow optimization in a surgical suite by omitting or reducing bottlenecks which cause loss of time is one of the key solutions in minimizing the patients' length of stay (LOS) in the system, lowering the expenses, increasing efficiency, and also enhanc...


TextPro-AL: An Active Learning Platform for Flexible and Efficient Production of Training Data for NLP Tasks

This paper presents TEXTPRO-AL (Active Learning for Text Processing), a platform where human annotators can efficiently work to produce high quality training data for new domains and new languages exploiting Active Learning methodologies. TEXTPRO-AL is a web-based application integrating four components: a machine learning based NLP pipeline, an annotation editor for task definition and text an...



Publication date: 2008